NSF PAR Search | NSF Public Access Repository

Using a units ontology to annotate pre-existing metadata

https://doi.org/10.1038/s41597-025-04587-8

Porter, John H; O’Brien, Margaret; Frants, Marina; Earl, Stevan; Martin, Mary; Laney, Christine M (February 2025, Scientific Data)

Automated processing of environmental data is hindered by the wide array of unit representations provided in the metadata of digital datasets. For example, gm/m2, g/m2, gm-2, g/m^2, g.m-2 and gramPerMeterSquared are all representations of a single complex unit that might be human-readable but are not machine-interpretable. Connecting ad hoc units to a single unit concept in an ontology permits the identification of datasets sharing units and provides additional information regarding labels, definitions, dimensions and transformations provided in the ontology. Here we use successive string transformations to link ad hoc unit representations to units in the QUDT ontology (e.g., unit:GM-PER-M2). Although only 896 of 7,110 distinct units in a corpus of ecological metadata from DataONE, the Environmental Data Initiative and the U.S. National Ecological Observatory Network were matched, 324,811 of 355,057 total unit uses (instances) were successfully mapped to QUDT units (91%). The resulting lookup table was used to enable a web service and R functions for adding annotation elements to Ecological Metadata Language documents.
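The idea of successive string transformations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule list, the one-entry lookup table, and the function name `match_unit` are all assumptions made up for this example, chosen only so that the six spellings quoted in the abstract resolve to the same QUDT identifier.

```python
import re

# Hypothetical one-entry lookup table: normalized spelling -> QUDT unit.
# Only the g/m2 example comes from the abstract; a real table would be
# far larger.
QUDT_LOOKUP = {"g/m2": "unit:GM-PER-M2"}

# Illustrative rewrite rules (assumptions, not the published rule set).
# They are applied one after another, and the lookup is retried after
# each step.
RULES = [
    (r"\^", ""),                     # g/m^2 -> g/m2
    (r"Squared", "2"),               # spelled-out exponent
    (r"Per", "/"),                   # spelled-out division
    (r"gram", "g"),                  # spelled-out gram
    (r"Meter", "m"),                 # spelled-out meter
    (r"^gm(?=/)", "g"),              # "gm" used as an abbreviation of gram
    (r"^g\.?m-(\d)$", r"g/m\1"),     # negative exponent: gm-2, g.m-2
]


def match_unit(raw):
    """Return a QUDT identifier for an ad hoc unit string, or None."""
    s = raw.strip()
    if s in QUDT_LOOKUP:
        return QUDT_LOOKUP[s]
    for pattern, replacement in RULES:
        s = re.sub(pattern, replacement, s)
        if s in QUDT_LOOKUP:
            return QUDT_LOOKUP[s]
    return None  # unmatched units are left for manual review


if __name__ == "__main__":
    for spelling in ["gm/m2", "g/m2", "gm-2", "g/m^2", "g.m-2",
                     "gramPerMeterSquared", "furlong"]:
        print(spelling, "->", match_unit(spelling))
```

Retrying the lookup after every rule means a string is altered only as far as needed to find a match, and anything that never matches is returned as `None` rather than guessed.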
Managing linguistic obstacles in multidisciplinary, multinational, and multilingual research projects

https://doi.org/10.1371/journal.pone.0311967

Specht, Alison; Stall, Shelley; Machicao, Jeaneth; Catry, Thibault; Chaumont, Marc; David, Romain; Devillers, Rodolphe; Edmunds, Rorie; Jarry, Robin; Mabile, Laurence; et al (December 2024, PLOS ONE)
Afzaal, Muhammad (Ed.)
Environmental challenges are rarely confined to national, disciplinary, or linguistic domains. Convergent solutions require international collaboration and equitable access to new technologies and practices. The ability of international, multidisciplinary and multilingual research teams to work effectively can be challenging. A major impediment to innovation in diverse teams often stems from different understandings of the terminology used. These can vary greatly according to the cultural and disciplinary backgrounds of the team members. In this paper we take an empirical approach to examine sources of terminological confusion and their effect in a technically innovative, multidisciplinary, multinational, and multilingual research project, adhering to Open Science principles. We use guided reflection of participant experience in two contrasting teams—one applying Deep Learning (Artificial Intelligence) techniques, the other developing guidance for Open Science practices—to identify and classify the terminological obstacles encountered and reflect on their impact. Several types of terminological incongruities were identified, including fuzziness in language, disciplinary differences and multiple terms for a single meaning. A novel or technical term did not always exist in all domains, or if known, was not fully understood or adopted. Practical matters of international data collection and comparison included an unanticipated need to incorporate different types of data labels from country to country, authority to authority. Sometimes these incongruities could be solved quickly, sometimes they stopped the workflow. Active collaboration and mutual trust across the team enhanced workflows, as incompatibilities were resolved more speedily than otherwise. 
Based on the research experience described in this paper, we make six recommendations, accompanied by suggestions for their implementation, to improve the success of similar multinational, multilingual and multidisciplinary projects. These recommendations are conceptual, drawing on a single experience, and are offered as a basis for discussion and testing by others embarking on their research journey.
Full Text Available
Enhancing the FAIRness of Arctic Research Data Through Semantic Annotation

https://doi.org/10.5334/dsj-2024-002

Chong, Steven S; Schildhauer, Mark; O’Brien, Margaret; Mecum, Bryce; Jones, Matthew B (January 2024, Data Science Journal)

The National Science Foundation’s Arctic Data Center is the primary data repository for NSF-funded research conducted in the Arctic. There are major challenges in discovering and interpreting resources in a repository containing data as heterogeneous and interdisciplinary as those in the Arctic Data Center. This paper reports on advances in cyberinfrastructure at the Arctic Data Center that help address these issues by leveraging semantic technologies that enhance the repository’s adherence to the FAIR data principles and improve the Findability, Accessibility, Interoperability, and Reusability of digital resources in the repository. We describe the Arctic Data Center’s improvements. We use semantic annotation to bind metadata about Arctic data sets with concepts in web-accessible ontologies. The Arctic Data Center’s implementation of a semantic annotation mechanism is accompanied by the development of an extended search interface that increases the findability of data by allowing users to search for specific, broader, and narrower meanings of measurement descriptions, as well as through their potential synonyms. Based on research carried out by the DataONE project, we evaluated the potential impact of this approach, regarding the accessibility, interoperability, and reusability of measurement data. Arctic research often benefits from having additional data, typically from multiple, heterogeneous sources, that complement and extend the bases – spatially, temporally, or thematically – for understanding Arctic phenomena. These relevant data resources must be 'found', and 'harmonized' prior to integration and analysis. The findings of a case study indicated that the semantic annotation of measurement data enhances the capabilities of researchers to accomplish these tasks.
Full Text Available
The Value of a Data and Digital Object Management Plan (D(DO)MP) in Fostering Sharing Practices in a Multidisciplinary Multinational Project

https://doi.org/10.5334/dsj-2023-038

Specht, Alison; O’Brien, Margaret; Edmunds, Rorie; Corrêa, Pedro; David, Romain; Mabile, Laurence; Machicao, Jeaneth; Murayama, Yasuhiro; Stall, Shelley (October 2023, Data Science Journal)

Data Management Plans (DMP) are now a routine part of research proposals but are generally not referred to after funding is granted. The Belmont Forum requires an extensive document, a ‘Data and Digital Object Management Plan’ (D(DO)MP) for its awarded projects that is expected to be kept current over the life of the project. The D(DO)MP is intended to record team decisions about major tools and practices to be used over the life of the project for data and software stewardship, and for preservation of data and software products, aligned with the desired Open Science outcomes relevant to the project. Here we present one of the first instances of the use of Belmont’s D(DO)MP through a case study of the PARSEC project, a multinational and multidisciplinary investigation of the socioeconomic impacts of protected areas. We describe the development and revision of our interpretation of the D(DO)MP and discuss its adoption and acceptance by our research group. We periodically assessed the data management sophistication of team members and their use of the various nominated tools and practices. As a result, for example, we included summaries to enable the key components of the D(DO)MP to be readily viewed by the researcher. To meet the Open Science outcomes in a complex project like PARSEC, a comprehensive and appropriately structured D(DO)MP helps project leaders ensure that (a) team members are committed to the collaboration goals of the project, (b) there is regular and effective feedback within the team, (c) training in new tools is provided as and when needed, and (d) there is easy access to a short reference to the tools and descriptions of the nominated practices.
Full Text Available
Earth Science Data Repositories: Implementing the CARE Principles

https://doi.org/10.5334/dsj-2024-037

O’Brien, Margaret; Duerr, Ruth; Taitingfong, Riley; Martinez, Andrew; Vera, Lourdes; Jennings, Lydia L; Downs, Robert R; Antognoli, Erin; Brink, Talya Ten; Halmai, Nicole B; et al (January 2024, Data Science Journal)

Datasets carry cultural and political context at all parts of the data life cycle. Historically, Earth science data repositories have taken their guidance and policies as a combination of mandates from their funding agencies and the needs of their user communities, typically universities, agencies, and researchers. Consequently, repository practices have rarely taken into consideration the needs of other communities such as the Indigenous Peoples on whose lands data are often acquired. In recent years, a number of global efforts have worked to improve the conduct of research as well as data policy and practices by the repositories that hold and disseminate it. One of these established the CARE Principles for Indigenous Data Governance (Carroll et al. 2020), representing ‘Collective Benefit’, ‘Authority to Control’, ‘Responsibility’, and ‘Ethics’, hosted by the Global Indigenous Data Alliance (GIDA 2023a). In order to align to the CARE Principles, repositories may need to update their policies, architecture, service offerings, and their collaboration models. The question is how? Operationalizing principles into active repositories is generally a fraught process. This paper captures perspectives and recommendations from many of the repositories that are members of the Earth Science Information Partners (ESIPFed, n.d.) in conjunction with members of the Collaboratory for Indigenous Data Governance (Collaboratory for Indigenous Data Governance n.d.) and GIDA, and defines and prioritizes the set of activities Earth and Environmental repositories can take to better adhere to CARE Principles, in the hopes that this will help implementation in repositories globally.
Full Text Available
A Deep-Learning Method for the Prediction of Socio-Economic Indicators from Street-View Imagery Using a Case Study from Brazil

https://doi.org/10.5334/dsj-2022-006

Machicao, Jeaneth; Specht, Alison; Vellenich, Danton; Meneguzzi, Leandro; David, Romain; Stall, Shelley; Ferraz, Katia; Mabile, Laurence; O’Brien, Margaret; Corrêa, Pedro (January 2022, Data Science Journal)

Full Text Available
